A new approach to the analysis and annotation of speech and prosody based on computerized cross-linguistic corpora

Author

  • Dolores Ramírez Verdugo
Abstract

In the present paper, corpus linguistics becomes a valuable methodological tool for cross-linguistic research on speech and prosody. The inherent complexity of speech analysis and prosodic annotation increases when the object of study is a longitudinal computerized corpus of native and non-native varieties of English. The lack of generally accepted prosodic transcription systems adds further difficulty to the task. Because the most popular transcription models, such as INTSINT or ToBI, are intended for standard varieties of a language, we were led to propose a multidimensional level of annotation, which has proved effective in identifying the main prosodic characteristics of non-native speakers and their implications for the discourse. I present a new approach to the acoustic and discourse analysis of two computerized corpora of non-native and native speakers of English (460 hours of spoken language, 3.32.400 words). These parallel corpora belong to an ongoing longitudinal research project: the UAM Corpus of Spoken English as a Second Language, funded by the Autonomous Community of Madrid (CAM, 06/0027/2001). I survey and annotate the prosodic patterns used by both native and non-native speakers, aiming to describe the extent to which the intonation systems used by non-native speakers may affect the information structure and the discourse meaning of their messages. I propose new levels of annotation, as Figures 1 and 2 illustrate.

NSS // 1 But/ last No/ vember/ wasn’t/ cold// //1Was it//
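The annotated example at the end of the abstract can be read into a simple data structure. The following is a minimal sketch, not the author's tooling. It assumes that the text before the first `//` is a speaker label, that `//` delimits tone units, that `/` delimits rhythmic feet, and that a digit at the start of a tone unit is a tone number. The abstract does not spell out these conventions, and the names `ToneUnit` and `parse_annotation` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToneUnit:
    tone: Optional[str]  # leading digit, assumed here to be a tone number
    feet: List[str]      # segments between single slashes, assumed to be feet


def parse_annotation(line: str) -> Tuple[str, List[ToneUnit]]:
    """Split one annotated line into a speaker label and its tone units."""
    # Assumption: everything before the first "//" is the speaker label.
    speaker, _, rest = line.partition("//")
    units: List[ToneUnit] = []
    for chunk in rest.split("//"):
        chunk = chunk.strip()
        if not chunk:
            continue  # empty pieces arise between adjacent "//" boundaries
        # Assumption: an optional leading digit sequence marks the tone.
        match = re.match(r"(\d+)\s*(.*)", chunk, re.DOTALL)
        tone, body = (match.group(1), match.group(2)) if match else (None, chunk)
        feet = [foot.strip() for foot in body.split("/") if foot.strip()]
        units.append(ToneUnit(tone=tone, feet=feet))
    return speaker.strip(), units


example = "NSS // 1 But/ last No/ vember/ wasn’t/ cold// //1Was it//"
speaker, units = parse_annotation(example)
```

With the example line, `speaker` holds the label `NSS` and `units` holds two tone units, each with its tone digit and its list of feet. Further annotation levels, such as acoustic measurements or discourse functions, could be added as extra fields on `ToneUnit`, but the abstract does not specify them.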

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation

In the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and researc...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Consistency Maintenance in Prosodic Labeling for Reliable Prediction of Prosodic Breaks

For the implementation of the prosody prediction model, large scale annotated speech corpora have been widely applied. Reliability among transcribers, however, was too low for successful learning of an automatic prosodic prediction. This paper reveals our observations on performance deterioration of the learning model due to inconsistent tagging of prosodic breaks in the established corpora. Th...

Fully automatic segmentation for prosodic speech corpora

While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data to initially train the process or manual post-processing. This is very time-consuming and slows down porting of speech systems to new languages. In the context of prosody corpora for text-to-speech (TTS) systems, we investigated methods for f...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 31

Publication date: 2003